ABSTRACT


With the continuing gain in computing power, bandwidth, and Internet popularity 
there is a growing interest in Internet communities. To participate in these communities, 
people need virtual representations of their bodies, called avatars. Creation and rendering 
of realistic personalized avatars for use as virtual body representations is often too
complex for real-time applications such as networked virtual environments (VE). VE
designers have had to settle for unbelievable, simplistic avatars and
constrain avatar motion to a few discrete positions. 

The approach taken in this thesis is to use a full-body laser-scanning process to
capture human body surface anatomical information accurate to the scale of millimeters.
Using this 3D data, virtual representations of the original human model can be simplified,
constructed and placed in a networked virtual environment.

The result of this work is a set of photo-realistic avatars that are efficiently
rendered in real-time networked virtual environments. The avatar is built in the Virtual
Reality Modeling Language (VRML). Avatar motion can be controlled either with
scripted behaviors using the H-Anim specification or via wireless body tracking sensors
developed at the Naval Postgraduate School. Live 3D visualization of animated
humanoids is viewed in freely available web browsers.


I. INTRODUCTION


A. BACKGROUND 

In 1965 Gordon Moore, a cofounder of Intel Corporation, predicted that
computing power would roughly double every 18 to 24 months [Ref. 1]. This prediction,
known as “Moore’s Law”, has been remarkably accurate for nearly forty years, and there
is every indication that it will remain so for the foreseeable future. Computer
graphical power, however, has been surpassing even Moore’s Law, approximately cubing
every 18 to 24 months [Ref. 2]. Internet bandwidth is experiencing an evolution as well,
with ADSL/DSL (Asymmetric Digital Subscriber Line/Digital Subscriber Line) and
cable modems becoming more commonplace in homes and offices across America. At
the end of 2000, there were over two million American homes with DSL/ADSL and the
growth rate is accelerating [Ref. 3].

With these computing power and Internet bandwidth gains, and the increasing
popularity of the worldwide web, there is a growing interest in networked virtual
environments (NVE). An NVE is a software environment in which multiple users interact
with each other in real-time, even though these users may not be physically in the same
room, or even on the same continent. These environments could consist of anywhere
from two to thousands of people, possibly all interacting in the same environment. Some
examples are business conferences, engineering design forums, entertainment
applications, distance learning and Department of Defense (DoD) simulations. [Ref. 4]

When entering an NVE, each participant assumes a virtual persona, visually
represented by an avatar, which includes a body structure model, motion model, physical
model, and possibly many other characteristics depending on the application [Ref. 4].
This thesis is an attempt to construct an articulated, anatomically accurate avatar and
place it within an NVE, which may be viewed via freely available web browsers. The
avatar is the result of a full-body laser-scanning process and is accurate to a scale of
millimeters [Ref. 5]. Consideration has been given to graphical complexity and
bandwidth requirements, with the final model being extremely efficient and usable on
today’s computers. The avatar can be used in virtual applications. Avatar movement can
be controlled via pre-scripted movements, such as those defined using the VRML H-Anim
specification [Ref. 6], or made to shadow the controlling person’s movement via real-time
motion capture [Ref. 7].

B. MOTIVATION 

The motivation for this project is to provide virtual environment (VE) designers
with the technology for “photo-realistic” avatars. These avatars are a replica of their
human source and are extremely lifelike since their movements are driven by either
pre-scripted human animation data or actual human input, and are scaled to the user to
which the motion data pertains. Three examples of possible applications are entertainment,
collaborative meetings, and DoD areas of interest. Specific details regarding possible
scenarios are now examined.

1. Entertainment - A Presence and Analysis Tool

Presence is defined as the feeling of “being there.” Imagine a home video game
system where the input controller is no longer a joystick, keypad, mouse or other artificial 
device, but is instead the body of the gamer. The movement of the user’s physical body 
is translated into a digital representation by a motion tracking system. This 
representation is then connected to their on-screen avatar, which will then mimic every 
movement of the user. The result is that the user moves their body in the real world in 
exactly the same manner they want their virtual alter ego to move. For example, in a 
fighting game, the participant would be executing the moves in the real world, with the 
avatar mimicking their every movement and the virtual environment then responding 
appropriately. One can easily imagine such an interface for several different genres, from 
role-playing to virtual sports. 

Also in the realm of entertainment, but on a slightly different tack, is performance
analysis. Consider the example of virtual golf. A user who is motion tracked can
actually swing the golf club as they do in real life to play the game. Not only can they be
entertained from a gaming perspective, but they can also be instructed. The application
could monitor their swing, and point out weaknesses and areas of improvement. They
could redo the shot under the exact same conditions, experiment with their swing, and
observe the outcome. Also, their swing could be recorded and played back for useful
self-analysis.

2. Collaborative Meetings - It's All in the Body Language

The world community has long recognized the need for face-to-face meetings for 
effective communication. Since a large part of the way humans communicate is via body 
language, interacting through text alone is not sufficient in many cases. Subtle nuances 
of behavior that could be vitally important can be missed without certain non-verbal cues. 

Streaming video has been used as one solution to this problem but remains an 
expensive solution, both in terms of bandwidth and computer hardware. Such traditional 
media is also limited in that the viewpoint is fixed. The viewer can only see the action
from the angle it was recorded, and is helpless to view the scene from another angle if 
circumstance or preference dictate otherwise. Physical constraints may make it 
impossible or impractical to place cameras at certain vantage points. 

Avatars offer a more flexible alternative, with lower bandwidth requirements. 
Active participants are tracked, with their avatars mimicking their behavior in the virtual 
world. Thus, all of the participants can see a shrug, a shake of the head, or a hand placed 
to the chin in thought. Flexibility is provided by complete virtual camera control. All
participants can place their viewpoint anywhere they wish and zoom in or out.
Engineering and architectural design collaborative sessions, distance learning and
business meetings are examples of this type of application.

3. DoD Areas of Interest - Cutting-Edge War Fighting and Training 

The DoD has long been the largest developer of large networked virtual
environments, with the goal of training personnel more effectively and economically
[Ref. 4]. Simulator Networking (SIMNET) [Ref. 8] and the Distributed Interactive
Simulation (DIS) protocol [Ref. 9] are examples of DoD interest in this area. To date, the
use of human entities or dismounted infantry (DI) has been limited in most
high-resolution virtual simulations [Ref. 10].

With the use of realistic avatars, the military simulation role may be expanded to
include action at the individual soldier level, vice incorporating only large-scale troop
movements. Networked virtual rehearsal becomes possible, thereby eliminating
geographical separation difficulties between command and troops. Rehearsal can be
done with less bandwidth and more securely than current methods.

Currently, physical descriptions with accompanying distinguishing physical
feature descriptions are part of every military service record. This information could also
include laser scan data. Such data would describe each member's appearance as well as
their physical dimensions down to the millimeter. Besides being useful as a means of
accurate identification, this data would also be available for use in creating personalized
avatars. The scan output could be called up to render every member in 3D. One possible
application would be to drive their avatars with motion-tracking sensors. In this manner,
commanders may view a battlefield simulation as it unfolds, with their view unlimited by
physical constraints, either taking a “god's-eye” view or zooming in to one specific
combatant according to their preference. The mission may also be recorded and played
back for debriefing, and the playback camera position would not be limited to the original
position when the data was recorded, thus giving a distinct advantage over current
conventional recording and playback methods.

C. OBJECTIVES 

The objective of this research is to construct human replica avatars for use in
virtual environments using data obtained from a whole body laser scanning process. To
achieve this objective, the following areas are addressed:

• The complexity of laser scan data must be reduced so that the avatars may be 
rendered efficiently with current computing technology. 

• The data must be translated into one or more universal formats that are
platform independent and will therefore run under several different operating
systems. Optimally, the chosen file format will have open source code that is
freely distributed.

• The data obtained from the scanning process is a “data soup” in that it is a
single figure with no segmentation. The output data must be organized to
segment the body in order to provide for full articulation and realistic
movement.

• The avatar must be built from its body segment building blocks, be physically
accurate and visually compelling.

• Avatar movement must be possible through scripted (pre-defined) motion, and 
also through real-time input, such as over a network from motion trackers. 


D. THESIS OUTLINE 

This chapter describes the background, motivation and objectives to be achieved
in order to produce 3D avatar replicas from laser scan data. Chapter II contains a concise
problem statement for this thesis. Chapter III provides an overview of 3D human scanning
technologies with an in-depth look at the 3D laser triangulation scanning method chosen
for this research. Chapter IV discusses the Virtual Reality Modeling Language (VRML), how
VRML and Java work together, humanoid animation and human motion tracking.
Chapter V discusses initial development efforts to include scan complexity and file
format issues, organizing laser scan data into body segments, selection of a 3D rendering
engine, scripted avatar behaviors, and communicating real-time motion tracking input
over networks. Chapter VI provides thesis conclusions and recommendations for future
or follow-on research.


II. PROBLEM STATEMENT


A. INTRODUCTION 

This chapter defines the problem examined for this thesis and offers a proposed 
solution. Further, the focus of this research is discussed, and design issues that were 
considered during model implementation are addressed. 

B. PROBLEM STATEMENT 

With the explosion of the worldwide web, people from all over the world are
interacting electronically with each other in ever-increasing numbers [Ref. 3].
Networked virtual environments (NVEs) provide one form of electronic interaction
among humans. An NVE is a software environment in which multiple users interact with
each other in real-time, even though these users may not be physically in the same room,
or even on the same continent [Ref. 4].

When entering an NVE, each participant assumes a virtual persona, called an
avatar, which includes a graphical representation, body structure model, motion model,
physical model, and possibly many other characteristics depending on the application
[Ref. 4]. While the film industry has enjoyed much success digitizing humans, much
processing time is required to create highly complex models that cannot be
rendered over networks with real-time interaction. Past solutions that have allowed
real-time interaction have compromised on avatar quality, resulting in overly simplified
models that reduce virtual reality effectiveness by decreasing the user’s sense of
presence. Virtual environment applications that require exact dimensions of the human
body may also suffer, as simplistic avatars often bear little resemblance to the original
model in both appearance and measurement. Poorly sized models can lessen
the user’s sense of presence, since the avatar’s limbs may appear to go through virtual
objects, including the avatar itself. Manual exercises, such as reaching out and
manipulating an object, become difficult if avatar dimensions do not equal the controlling
human’s dimensions.


C. PROPOSED SOLUTION


The proposed solution for these challenges is to develop a high-resolution,
dimensionally accurate human model, or avatar, with a realistic appearance. The model
must be efficient enough to run easily on today's computers, and scale well so that many
avatars could be rendered simultaneously while maintaining a satisfactory frame rate.
Further, avatar control through either pre-scripted actions or real-time updates via
networking must be supported. Finally, the system must be platform independent to
permit hardware and software flexibility.

D. RESEARCH FOCUS 

The focus of this research is to build a fully articulated human model from laser
scan data for use as an avatar. The model must be simplified, and then built using an
international standard for networked humanoid animation, the Humanoid Animation
Specification 1.1 (H-Anim 1.1). The H-Anim 1.1 canonical exemplar Nancy.wrl, written
in Virtual Reality Modeling Language (VRML), is used as an avatar foundation. Using
H-Anim 1.1 and VRML provides the capacity for pre-scripted avatar control.
Additionally, Java and VRML must be made to work together efficiently to provide the
capability of real-time networked avatar control. Using Java and VRML ensures
platform independence. The implementation must be able to accept quaternion inputs to
be compatible with the Magnetic Angular Rate Gravity (MARG) motion tracking sensors
developed at the Naval Postgraduate School.
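
To illustrate the quaternion requirement: a VRML rotation field (SFRotation) stores an axis and an angle, so a quaternion reported by a sensor has to be converted before it can be applied to a joint. The following Java sketch shows one way such a conversion could look. The class and method names are hypothetical, the (w, x, y, z) ordering is an assumption, and nothing here is taken from the thesis implementation or from the MARG sensor software.

```java
/** Hypothetical helper, not part of the thesis software. */
final class QuaternionToRotation {

    /**
     * Converts the quaternion (w, x, y, z) to {axisX, axisY, axisZ, angleRadians},
     * the value layout of a VRML SFRotation. The input is normalized first so
     * that small sensor drift away from unit length is tolerated.
     */
    static double[] toAxisAngle(double w, double x, double y, double z) {
        double norm = Math.sqrt(w * w + x * x + y * y + z * z);
        if (norm == 0.0) {
            throw new IllegalArgumentException("zero quaternion has no orientation");
        }
        w /= norm;
        x /= norm;
        y /= norm;
        z /= norm;

        // Clamp so rounding cannot push the acos argument outside [-1, 1].
        double clampedW = Math.max(-1.0, Math.min(1.0, w));
        double angle = 2.0 * Math.acos(clampedW);
        // sin(angle / 2); max() keeps the sqrt argument non-negative.
        double s = Math.sqrt(Math.max(0.0, 1.0 - clampedW * clampedW));

        if (s < 1e-9) {
            // No net rotation, so the axis is arbitrary; +y is chosen here.
            return new double[] {0.0, 1.0, 0.0, 0.0};
        }
        return new double[] {x / s, y / s, z / s, angle};
    }

    public static void main(String[] args) {
        // Quarter turn about the y axis: w = cos(45 deg), y = sin(45 deg).
        double h = Math.sqrt(0.5);
        double[] r = toAxisAngle(h, 0.0, h, 0.0);
        System.out.println("rotation " + r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
    }
}
```

Normalizing before the conversion is a design choice of this sketch: it keeps the result well defined even when the incoming quaternion is slightly off unit length.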

E. DESIGN CONSIDERATIONS 

The most significant design consideration is how to transform the "data cloud"
obtained from laser scans into a fully articulated, segmented avatar. Multiple proprietary
and non-proprietary data conversion methods must be examined before an
implementation is chosen. The selected implementation first uses the Cyberware
Laboratory "Decimate" software package to reduce model complexity and provide file
format translations, then uses "Maya" from Alias|Wavefront for avatar segmentation.

Other major design considerations are: choosing which scanning method to use to
capture body surface information; constructing the avatar using Nancy.wrl; implementing
networking capability via DIS-Java-VRML; and providing for quaternion input for
networked control.

F. SUMMARY 

This chapter defines the problem addressed by this research and offers a proposed 
solution. The focus of this thesis is discussed, and design considerations are examined.


III. 3D SCANNING OF HUMANS


A. INTRODUCTION 

This chapter provides an overview of the various 3D scanning technologies 
available at the time of the writing of this thesis. An in-depth discussion of human laser 
scanning follows, as it is the 3D scanning method chosen for this research. 

B. METHODS FOR 3D SCANNING OF HUMANS 

Although various 3D scanning methods have been available for the last two 
decades, recent advances in image sensing have increased their speed and accuracy 
tremendously [Ref. 11]. An overview of current human 3D scanning technologies 
follows. 

1. Stereoscopic Vision Scanners

Stereoscopic scanning is a passive optical technique. Two or more digital images 
are taken from known locations. These images are then processed to find correlations 
between objects in the images. Figure 1 illustrates the principles of stereoscopic vision. 
Points a, b and c represent common features seen from two separate viewpoints. 



Figure 1. Simple Optical Triangulation.

Each viewpoint has both a focal distance and angle. The two different distance/angle
combinations are used to calculate the distance to the common elements. Human
binocular vision works in the same manner. The further away the common elements, the
more the two separate focal distance/angle pairs will agree. As the object moves closer
relative to the viewer, focal angles between the two vision sensors vary increasingly,
making possible a distance estimate.
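
The relationship between viewpoint disagreement and distance can be made concrete with a short Java sketch. It assumes a simplified arrangement that the thesis does not specify: two identical, parallel cameras whose images are already aligned row by row, so that a feature's horizontal image position differs between the two views by a disparity measured in pixels. The class name, parameter names and numbers are illustrative only.

```java
/** Illustrative stereo depth calculation for two parallel, identical cameras. */
final class StereoDepth {

    /**
     * Depth = focal length (pixels) * baseline (meters) / disparity (pixels).
     * A distant feature produces a small disparity (the two views nearly agree);
     * a close feature produces a large one.
     */
    static double depth(double focalLengthPx, double baselineMeters,
                        double xLeftPx, double xRightPx) {
        double disparity = xLeftPx - xRightPx;
        if (disparity <= 0.0) {
            throw new IllegalArgumentException("feature must have positive disparity");
        }
        return focalLengthPx * baselineMeters / disparity;
    }

    public static void main(String[] args) {
        // Same cameras, two features: one with large disparity, one with small.
        System.out.println("near feature: " + depth(800.0, 0.1, 420.0, 380.0) + " m");
        System.out.println("far feature:  " + depth(800.0, 0.1, 402.0, 398.0) + " m");
    }
}
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant features than for near ones, which is one reason image correlation quality dominates the accuracy of this method.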

In theory, this method requires very little hardware: at the minimum just two
cameras and a computer to process the images. In reality, however, correlations between
images are often difficult to ascertain, necessitating the use of special light projectors that
add to both system complexity and cost. Furthermore, with current computing power
processing the photographic images can take ten minutes or more, and the resulting
image quality is still vastly inferior to that obtained with laser-based systems. Image
quality can be improved through the use of higher-resolution cameras, but at the expense
of considerably longer processing times and hardware costs. [Ref. 12]

2. Moire Projection Scanners 

In Moire projection scanning, a series of structured light patterns is projected
onto the object to be digitized. The shape of the object causes the base pattern to be
distorted from its original design. By analyzing this distorted light pattern the shape of
the object is calculated, and x-y-z coordinates are then produced. Figure 2 below is an
example of a typical Moire pattern on an object. [Ref. 13]



Figure 2. Moire Light Patterns. 

Moire scanning has several disadvantages. The object must be lit solely by the
capturing light source, as bright ambient light disrupts the contrast of the pattern and
interferes with the resolution of the scan [Ref. 13]. Also, scan quality depends on the
color of the object, possibly resulting in large scan errors and reduced resolution.
Although some of the best Moire scanners can offer fidelity comparable to laser
scanning, scan time is greatly extended and the units are more costly than their laser
counterparts [Ref. 12].


3. Time-of-Flight (TOF) Scanners 

These scanners use a type of laser scanning based on the technique of Laser 
Imaging Detection and Ranging (LIDAR). Distance is measured by comparing the phase 
of the returned laser beam to the original, allowing for very accurate scanning of very 
large objects. Figure 3 graphically depicts the principles of time-of-flight scanning.
[Ref. 14]



Figure 3. Time-of-Flight Laser Scanning. Diagram labels: Modulated Laser Signal;
Returned Laser Light; Amplitude Converts To Intensity Image; Measure Range To Each
Point In Image.

Unlike triangulation scanning, TOF scanning accuracy remains constant
regardless of the distance from the scanner. This makes TOF scanners ideal for large
objects. Unfortunately, only geometry information is captured and the object’s texture
information is not acquired. Additionally, TOF scanning is much slower than laser
triangulation. With current technology, TOF scanning is not practical for the fast
digitization of small and medium-size objects. [Ref. 12]
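
The phase comparison used by these scanners can be illustrated with a small Java sketch. It assumes the common arrangement in which the laser's intensity is modulated at a fixed frequency and the range follows from the phase lag of the returned signal; the modulation frequency in the example is an arbitrary illustrative value, not a figure from the thesis or from any particular scanner, and the class name is hypothetical.

```java
/** Illustrative phase-shift range calculation for a modulated laser signal. */
final class PhaseShiftRange {

    static final double SPEED_OF_LIGHT = 299_792_458.0; // meters per second

    /**
     * Range from the phase lag of the returned signal. The light travels out
     * and back, so a full 2*pi lag corresponds to half a modulation wavelength.
     */
    static double range(double phaseShiftRadians, double modulationHz) {
        if (modulationHz <= 0.0) {
            throw new IllegalArgumentException("modulation frequency must be positive");
        }
        return SPEED_OF_LIGHT * phaseShiftRadians / (4.0 * Math.PI * modulationHz);
    }

    /** Largest range that can be measured before the phase wraps around. */
    static double unambiguousRange(double modulationHz) {
        return SPEED_OF_LIGHT / (2.0 * modulationHz);
    }

    public static void main(String[] args) {
        double f = 10.0e6; // assumed 10 MHz modulation, for illustration only
        System.out.println("range at half-cycle lag: " + range(Math.PI, f) + " m");
        System.out.println("unambiguous range:       " + unambiguousRange(f) + " m");
    }
}
```

The sketch also shows the trade-off inherent in the technique: a lower modulation frequency extends the unambiguous range, while a higher one makes a given phase resolution correspond to a finer distance step.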

4. Laser Triangulation Scanners 

Laser triangulation is a stereoscopic technique that calculates distances to an
object by means of a video camera and a laser light source. Figure 4 is a schematic
representation of the laser triangulation process. A laser beam is reflected from a mirror
onto the object to be scanned. The laser light is scattered by the object and is picked up
by the detector, in this case a video camera. Since the triangulation distance and
transmitted light angle are known, the distance to the object may be calculated from the
received image using basic trigonometry.


Figure 4. Laser Triangulation Scanning.
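
The trigonometry involved can be sketched in a few lines of Java. The geometry below is an assumed simplification, not the scanner's actual calibration model: the laser source sits at the origin, the camera sits at a known triangulation distance along the x axis, and both angles are measured from that baseline toward the object. The class and method names are hypothetical.

```java
/** Illustrative planar laser triangulation; geometry is a stated assumption. */
final class LaserTriangulation {

    /**
     * Returns {x, z}, the position of the lit surface point.
     * baseline    - distance from laser to camera along the x axis
     * laserAngle  - angle of the outgoing beam, measured from the baseline
     * cameraAngle - angle at which the camera sees the spot, from the baseline
     */
    static double[] surfacePoint(double baseline, double laserAngle, double cameraAngle) {
        double apex = Math.PI - laserAngle - cameraAngle; // angle at the surface point
        if (baseline <= 0.0 || laserAngle <= 0.0 || cameraAngle <= 0.0 || apex <= 0.0) {
            throw new IllegalArgumentException("angles and baseline must form a triangle");
        }
        // Law of sines: the side opposite the camera angle is the laser-to-surface range.
        double range = baseline * Math.sin(cameraAngle) / Math.sin(apex);
        return new double[] {range * Math.cos(laserAngle), range * Math.sin(laserAngle)};
    }

    public static void main(String[] args) {
        double[] p = surfacePoint(2.0, Math.toRadians(45.0), Math.toRadians(45.0));
        System.out.println("surface point x=" + p[0] + " z=" + p[1]);
    }
}
```

Repeating this calculation for every laser position along a sweep yields one surface point per sample, which is how a scan accumulates its set of coordinates.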

This method is fast when compared with other scanning methods. Whole body
scans take on the order of seconds, and are accurate down to the millimeter. Texture
capture is also possible, allowing for highly detailed scan output. The end product is a
lifelike, believable digital representation of the original object.


C. THE CYBERWARE WHOLE BODY SCANNING PROCESS

1. System Background

The whole body laser scanning platform chosen for this research was the Cyberware
Laboratory Incorporated Model WB4 triangulation laser scanner. This platform was
chosen primarily due to the superiority of laser triangulation scanning for avatar
purposes.

A complete body scan takes approximately 17 seconds, and captures both x-y-z
coordinates and surface textures. Figure 5 shows the Cyberware WB4 body scanner.





Figure 5. Cyberware Scanner Model WB4. From Ref. [5]. 

2. System Operation 

Four yellow scanning heads are used to provide redundant data overlap to
minimize the possibility of the subject inadvertently masking parts of their body from the
lasers and thus preventing detection of those coordinates. These four scanning heads are
mounted on vertical rails. A separate platform to support the subject allows for
independent alignment of the heads. The scan begins with the heads at their topmost
position. The heads travel down the rails, capturing body coordinate and Red-Green-Blue
(RGB) texture information in one pass, which takes approximately 17 seconds.
[Ref. 15]


The laser scan process is controlled via a computer graphics workstation. Scanner
output is a “data cloud” of points, which form vertices for the rendering triangles. The
resulting model is a single figure, without segmentation of any kind. It is in *.ply
(Alias|Wavefront) format, which is in the public domain. This allows for easy file
inspection, and for the possibility of constructing custom file translators. Cyberware has
translators available that convert the scan data into the following file formats: 3D Studio,
Digital Arts (SGI), DXF, IGES 128 NURBS, MOVIE.BYU (SGI), STL, SCR (SGI Mesh
and Slice), ASCII, IGES (106, 110, 112, 124), Inventor, OBJ, Echo and VRML. [Ref. 16]

D. SUMMARY 

This chapter discusses the methods, advantages and disadvantages of various 
technologies for 3D scanning of humans. Additionally, the whole body scanning system 
used for this research is examined in detail.


IV. RELATED WORK


A. INTRODUCTION 

This chapter provides background on Virtual Reality Modeling Language
(VRML), and on how VRML and Java work together. Further, it examines the
Humanoid Animation 1.1 Specification (H-Anim 1.1 spec) and its canonical example,
Nancy.wrl. Finally, various methods of human motion tracking are discussed, including
a discussion of the inertial and magnetic limb segment trackers developed at the Naval
Postgraduate School.

B. VIRTUAL REALITY MODELING LANGUAGE (VRML) 

VRML provides a standard, platform-independent method of rendering 3D scenes
across the Internet. It is a 3D scene description language for specifying virtual worlds.
VRML supports both static and animated 3D/multimedia objects. VRML applications
can embed hyperlinks to many popular digital multimedia file types. The VRML
specification is International Organization for Standardization (ISO) standard
ISO/IEC 14772-1. Sample VRML output and source code are shown in Figure 6.

#VRML V2.0 utf8
Group {
  children [
    Viewpoint {
      description "initial view"
      position 6 -1 0
      orientation 0 1 0 1.57
    }
    Shape {
      geometry Sphere { radius 1 }
      appearance Appearance {
        texture ImageTexture {
          url "earth-topo.png"
        }
      }
    }
    Transform {
      translation 0 -2 1.25
      rotation 0 1 0 1.57
      children [
        Shape {
          geometry Text {
            string [ "Hello" "world!" ]
          }
          appearance Appearance {
            material Material {
              diffuseColor 0.1 0.5 1
            }
          }
        }
      ]
    }
  ]
}



Figure 6. Example of Simple VRML Source Code and Output. From Ref. [17]. 

VRML scenes are constructed using nodes. These nodes are organized in a 
hierarchical fashion into a directed acyclic graph, or scene graph. VRML files end with
*.wrl, or *.wrz if the file is gzip-compressed. The main method for the user to interact 
with a VRML world through a browser is via point and click. Thus VRML world content 
can contain embedded links just like traditional HyperText Markup Language (HTML). 
Typically VRML viewers, or browsers, are installed as plug-ins into popular 2D web 
browsers. [Ref. 17] 


Four main components may be contained in a VRML file: the VRML header;
Prototypes; Shapes (geometry and appearance), Interpolators, Sensors, and Scripts; and
Routes [Ref. 18]. Of these components, the only one required is the VRML header.
Prototypes (PROTOs) are a powerful feature that provides for user-defined nodes,
significantly increasing language extensibility. PROTOs may be combined into libraries,
and are referenced using an external prototype (EXTERNPROTO) command, allowing
for extensive code reuse. Shape nodes can contain both geometry and appearance nodes.
Geometry nodes contain information on how the 3D object is constructed, and may be
primitives such as a cylinder, cone, cube, or sphere, or may be text and indexed face sets.
Appearance nodes describe how 3D objects look, and may include the object's color,
texture, and transparency level. Interpolators allow for key frame animation. Sensors are
the means by which the user interacts with the virtual world. Script nodes provide an
interface for the VRML world to interact with a program script, such as Java or
JavaScript. This ability for VRML to connect with powerful programming languages
provides much flexibility, and is crucial for performing complex animations and network
communication. Finally, routes define connections between nodes and fields, allowing
for pre-defined events to be passed along the route to initiate program actions or
animations.
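
Key frame animation of the kind interpolators perform can be sketched in Java as follows. The class is a hypothetical stand-in that mirrors the behavior of a scalar interpolator (a list of keys, a matching list of key values, and linear blending between neighboring keys); it is not browser or thesis code. Humanoid joints would normally be animated with rotation values rather than scalars, so the scalar case is shown only to keep the example short.

```java
/** Hypothetical stand-in for scalar key frame interpolation. */
final class ScalarKeyframes {

    private final double[] key;      // fractions in non-decreasing order
    private final double[] keyValue; // value at each key

    ScalarKeyframes(double[] key, double[] keyValue) {
        if (key.length == 0 || key.length != keyValue.length) {
            throw new IllegalArgumentException("need one value per key");
        }
        for (int i = 1; i < key.length; i++) {
            if (key[i] < key[i - 1]) {
                throw new IllegalArgumentException("keys must be non-decreasing");
            }
        }
        this.key = key.clone();
        this.keyValue = keyValue.clone();
    }

    /** Returns the linearly interpolated value for a fraction of the animation. */
    double valueAt(double fraction) {
        int last = key.length - 1;
        if (fraction <= key[0]) {
            return keyValue[0];       // before the first key: hold first value
        }
        if (fraction >= key[last]) {
            return keyValue[last];    // after the last key: hold last value
        }
        int i = 1;
        while (key[i] < fraction) {
            i++;                      // find the first key at or past the fraction
        }
        double t = (fraction - key[i - 1]) / (key[i] - key[i - 1]);
        return keyValue[i - 1] + t * (keyValue[i] - keyValue[i - 1]);
    }

    public static void main(String[] args) {
        // An angle that swings from 0 to 1.57 radians and back over one cycle.
        ScalarKeyframes swing = new ScalarKeyframes(
                new double[] {0.0, 0.5, 1.0}, new double[] {0.0, 1.57, 0.0});
        System.out.println("angle at 25% of cycle: " + swing.valueAt(0.25));
    }
}
```

Only the key frames need to be stored or transmitted; every intermediate pose is computed on the viewing machine, which is why this style of animation suits bandwidth-limited networked scenes.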


C. VRML AND JAVA WORKING TOGETHER 

By combining the authoring abilities of VRML with the programming resources
of Java, a powerful hybrid is created that is more than the sum of its parts. Simple 3D
content creation can be married with complex animated behaviors to give intricate
results.


VRML and Java communicate via Script nodes, which contain Java functionality.
Script nodes appear in the VRML file, and allow for connecting Java variables to VRML
fields. Java classes must import the vrml.* class libraries contained in the DIS-Java-VRML
package in order to provide type conversions between Java and VRML. To interface
properly with the VRML browser, Java classes used by Script nodes must extend the
vrml.node.Script class. The basic Script node interface is shown in Figure 7. [Ref. 17]


Script {
  exposedField MFString  url           []
  field        SFBool    directOutput  FALSE
  field        SFBool    mustEvaluate  FALSE
  # And any number of:
  eventIn      eventType eventName
  field        fieldType fieldName initialValue
  eventOut     eventType eventName
}

The Script node is used to program behavior in a scene. Script nodes typically:
a. signify a change or user action;
b. receive events from other nodes;
c. contain a program module that performs some computation;
d. effect change somewhere else in the scene by sending events.

Figure 7. Script Node Interface. From Ref. [19].

The data type "exposedField" indicates that the associated variable has public access,
whereas the "field" data type provides private access to the respective variable. The
exposedField data member "url" contains the location of the Java class file. This location
may be on the local hard drive or on the Internet. For robustness, several urls may be
entered so that if the browser cannot find the named file in the first location, it will
automatically look in the next location. The fields "directOutput" and "mustEvaluate" are
hints to the browser on how to optimize performance. If directOutput is set to FALSE,
the script only passes events and does not modify VRML nodes directly. Conversely,
when directOutput is TRUE the script has permission to modify VRML nodes via the
respective fields. If mustEvaluate is FALSE, the browser may postpone updating for
rendering optimization. Setting this value to TRUE forces the browser to update when
fields are modified. The data types "eventIn" and "eventOut" are events. Events are
what give VRML scenes their interactivity and fluidity. Events are time-stamped
values of data types, and eventIn data types must match exactly the eventOut data types.
When a pre-defined event is triggered, the value of the variable is sent (along with a
time-stamp) from the eventOut connection to the associated eventIn connection.
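
The event-passing rules described above can be modeled with a small, self-contained Java sketch. It deliberately does not use the vrml.* classes: EventOut, EventIn and routeTo are stand-ins invented for this illustration, showing only the two rules stated in the text, namely that routed types must match exactly and that every value travels with a time-stamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/** Stand-in model of event routing; these are not the vrml.* API classes. */
final class EventRouting {

    /** Receives values of one declared type together with a time-stamp. */
    static final class EventIn<T> {
        final Class<T> type;
        final BiConsumer<T, Double> handler;

        EventIn(Class<T> type, BiConsumer<T, Double> handler) {
            this.type = type;
            this.handler = handler;
        }
    }

    /** Sends time-stamped values of one declared type to every routed EventIn. */
    static final class EventOut<T> {
        final Class<T> type;
        private final List<EventIn<T>> routes = new ArrayList<>();

        EventOut(Class<T> type) {
            this.type = type;
        }

        /** Plays the role of a ROUTE statement; mismatched types are rejected. */
        void routeTo(EventIn<?> target) {
            if (!type.equals(target.type)) {
                throw new IllegalArgumentException(
                        "eventOut type " + type.getSimpleName()
                        + " does not match eventIn type " + target.type.getSimpleName());
            }
            @SuppressWarnings("unchecked")
            EventIn<T> typed = (EventIn<T>) target; // safe: types were just compared
            routes.add(typed);
        }

        /** Delivers the value and its time-stamp along every route. */
        void send(T value, double timeStamp) {
            for (EventIn<T> in : routes) {
                in.handler.accept(value, timeStamp);
            }
        }
    }

    public static void main(String[] args) {
        EventOut<Double> touchTime = new EventOut<>(Double.class);
        EventIn<Double> startTime = new EventIn<>(Double.class,
                (value, stamp) -> System.out.println("startTime received " + value + " at " + stamp));
        touchTime.routeTo(startTime);
        touchTime.send(12.5, 12.5);
    }
}
```

Checking the types once, when the route is created, mirrors the idea that a mismatched route is an authoring error rather than something to be discovered each time an event fires.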

An example is shown in Figure 8. Upon starting the VRML scene, the associated
Java class identified by the Script node's "url" field is accessed, and its public method
"initialize()" is called automatically. In this method, the fields passed by reference from
the VRML file are connected to the "eventIn". The programmer may also perform any
initialization that is deemed necessary, such as positioning or content changes. When the
user activates the TouchSensor "ClickTextToTest" by clicking on the text with the
mouse, an event and time-stamp are sent from touchTime's eventOut to the Script node's
eventIn "startTime". This calls the Script node's public method "processEvent". The
programmer can then perform any Java functionality that is desired, and modify the fields
passed in by reference accordingly. In this example, both the content and position of the
text string are modified. [Ref. 17]


Figure 8. Example Script Node and Java Interaction. From Ref. [17].


D. THE HUMAN ANIMATION 1.1 SPECIFICATION 

The Humanoid Animation Working Group of the Web3D Consortium developed
the H-Anim 1.1 spec. The working group states the following charter [Ref. 20]:

Our aim is to specify a way of defining interchangeable humanoids and
animations in standard VRML 2.0 without extensions. Animations include
limb movements, facial expressions and lip synchronization with sound.

Our goal is to allow people to author humanoids and animations
independently...

Although originally restricted to VRML 2.0, the working group's goal has grown to 
providing virtual humanoid form and behavior regardless of the authoring tool used, and 
allowing for the interchangeability of virtual humanoids. No assumptions were made
concerning the application that would use the humanoids. One example of the
specification's flexibility is its appearance in High Level Architecture (HLA), where it
has been developed as a Federation Object Model (FOM) [Ref. 21]. 

The H-Anim 1.1 spec has as its root a single Humanoid node. This node serves 
the following purposes: 
• Stores human-readable data about the humanoid such as author and copyright
information.

• Provides a top-level Transform field for positioning the humanoid in the 
environment. 

• Stores references to all the Joint, Segment and Site nodes. 

• Serves as a "wrapper" for the humanoid. 

Joint nodes are arranged in a strictly defined hierarchy. They may contain other 
joint nodes, or segment nodes. Segment nodes describe the portion of the body 
connected to the associated joint, and may contain Site nodes and Displacer nodes. Site 
nodes contain location information relative to the segment, and can be used for placing 
clothing, jewelry, or other items on the segment. Site nodes may also be used as a
"manipulator handle" for inverse kinematics applications. Displacer nodes are simply
grouping nodes, allowing the programmer to identify a collection of vertices as belonging 
to a functional group for ease of manipulation. [Ref. 20] 
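
The containment rules just described can be sketched as a small data structure. The Java classes below are hypothetical and belong to no H-Anim or VRML library; they only mirror the rule that a Joint may hold other Joints and Segments, and that a Segment may hold Sites. The joint and segment names follow the H-Anim naming style, while the site name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes mirroring the containment rules described above.
final class HAnimHierarchySketch {

    /** A named location on a segment, e.g. for attaching an accessory. */
    static final class Site {
        final String name;
        Site(String name) { this.name = name; }
    }

    /** The body part attached to a joint; may carry Site nodes. */
    static final class Segment {
        final String name;
        final List<Site> sites = new ArrayList<>();
        Segment(String name) { this.name = name; }
    }

    /** A joint may contain child joints and segments. */
    static final class Joint {
        final String name;
        final List<Joint> joints = new ArrayList<>();
        final List<Segment> segments = new ArrayList<>();
        Joint(String name) { this.name = name; }

        Joint add(Joint child) { joints.add(child); return this; }
        Joint add(Segment segment) { segments.add(segment); return this; }

        /** Counts this joint plus every joint nested beneath it. */
        int countJoints() {
            int total = 1;
            for (Joint child : joints) {
                total += child.countJoints();
            }
            return total;
        }
    }

    /** Builds a small left-arm chain: shoulder, elbow and wrist. */
    static Joint buildLeftArm() {
        Segment hand = new Segment("l_hand");
        hand.sites.add(new Site("example_site")); // illustrative site name only
        Joint wrist = new Joint("l_wrist").add(hand);
        Joint elbow = new Joint("l_elbow").add(new Segment("l_forearm")).add(wrist);
        return new Joint("l_shoulder").add(new Segment("l_upperarm")).add(elbow);
    }

    public static void main(String[] args) {
        System.out.println("Joints in chain: " + buildLeftArm().countJoints());
    }
}
```

Because each joint owns its children, rotating a parent joint in a scene graph built this way would carry every nested joint and segment with it, which is the property the strict hierarchy is meant to guarantee.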

The H-Anim 1.1 Spec defines the "at rest" position, which specifies all joint
rotations to be zero. Additionally, it specifies that the origin be located between the feet
of the humanoid at ground level, and that the humanoid face the +z direction, with +y
being up and +x to the left of the humanoid. Just as important, the specification provides
naming conventions for 94 joints and their associated segments, allowing for an
extremely complex avatar (see Figure 9).
Figure 9. H-Anim Spec 1.1 Hierarchy. From Ref. [20].


E. NANCY: THE H-ANIM 1.1 STANDARD

Nancy.wrl was chosen as the foundation on which to build the avatar used in this
research, and is the canonical example of H-Anim 1.1. The author of Nancy is Cindy
Ballreich, who grants permission for its non-commercial usage with proper credit and use
of the 3Name3D name and logo. In the case of this thesis, Nancy was fundamentally
modified such that maintenance of the 3Name3D name and logo would have proven
challenging. Cindy Ballreich kindly granted permission for the use of Nancy for this
research without the company name and logo.

Nancy contains 17 joints, 15 segments and four default viewpoints. Four
pre-scripted behaviors are included: stand, walk, run and jump. The user activates these
behaviors by clicking on the appropriate text inside of the VRML world. See Figures 10-13.





Figure 10. Nancy Demonstrating the Stand Behavior. 
Figure 11. Nancy Demonstrating the Walk Behavior.



Figure 12. Nancy Demonstrating the Run Behavior. 
Figure 13. Nancy Demonstrating the Jump Behavior.


F. HUMAN MOTION TRACKING TECHNOLOGIES 

This section provides an overview of some of the human motion tracking methods 
available at the time of this research. The objective is not to undertake an exhaustive 
study of the field of motion tracking, but to discuss some of the more prevalent 
technologies and their respective limitations.

1. Mechanical Trackers 

Mechanical tracking is capable of not only tracking the movement of the user, but 
also of permitting the virtual environment to make itself felt through the use of haptic 
feedback. Since mechanical tracking is relatively accurate, much research has been done 
in using mechanical tracking as a calibration standard for various other tracking systems 
[Ref. 22]. Mechanical tracking can usually be categorized into one of two forms:
body-based and ground-based. [Ref. 7] 

Body-based mechanical tracking is performed by having the user wear a 
mechanical frame, or exoskeleton (see Figure 14). Angle measuring devices, called 
goniometers, are located at exoskeleton joint locations. By measuring the joint angles of
the exoskeleton, user limb orientation is obtained. 

One disadvantage of mechanical tracking is that since only the mechanical frame 
is tracked, errors are introduced if the exoskeleton shifts position on the body. Also, 
goniometer alignment with the joints is difficult. The goniometers are located externally
to the joint, and are therefore displaced from the joint by some offset amount. This offset
must be taken into account since it can introduce orientation errors. [Ref. 7] 

Another disadvantage of mechanical tracking is encumbrance. Not only must the 
user bear the weight of the exoskeleton, but it may also be impossible to obtain certain 
positions due to the size or shape of the device. These difficulties tend to detract from the 
user’s sense of presence, and severely limit the scope of the user’s interaction with the
virtual world. Although accurate and relatively inexpensive, mechanical tracking of
several users in a single, shared volume is problematic due to both interference of the
mechanical linkages and limited range. [Ref. 7]

2. Magnetic Trackers 

For real-time applications, magnetic tracking is the most prevalent. Possessing
reasonable accuracy with little or no obstruction problems, these relatively inexpensive
systems track both segment position and orientation with body-mounted sensors that 
measure a spatially varying magnetic field. 

Since the systems are magnetic, they possess disadvantages common to any 
magnetic-field device. As sensor distance from the source increases, magnetic field
strength decreases inversely with the square of the distance. This effectively
limits the useful range of magnetic tracking, usually to less than ten feet. Additionally,
orientation and position errors due to distortions of the spatial magnetic field increase
with the fourth power of the distance from the source. This results in a non-constant
error that varies according to sensor position and orientation relative to the source. 
Further, nearby metal objects can interfere, causing perturbations and even obstructions
of the magnetic field. Another disadvantage to magnetic tracking is latency. Vendor
latency data varies enormously, depending on the application. Finally, electrical
components generate their own magnetic field, which may induce noise and erratic
magnetic field behavior. [Ref. 22]

3. Optical Trackers 

Optical tracking is quickly catching up to magnetic tracking in terms of 
popularity. Currently, the main application of optical tracking is animation requiring
extensive off-line processing. It has been used in few real-time applications. The film
industry relies on this technology almost exclusively, as it is highly reliable under
controlled conditions. 

Since optical tracking depends on various light sources, the systems are highly
susceptible to interference from other light sources near the same frequency. Also, since
detection of optical sensors requires line of sight (LOS), the systems are vulnerable to
occlusion, making tracking of multiple people in a common work volume difficult.
Further, some types of light severely limit the range of some optical tracking systems.

Optical tracking systems fall into one of three categories. Image-based systems 
track position and movement by using multiple video cameras to track pre-selected 
sensors attached to the user. Pattern-recognition systems sense the distortion of a 
projected pattern of light to track position and orientation. This is a motion analog to
Moire scanning, which was discussed earlier in this thesis. Structured light and laser
systems have been promising, but thus far have not enjoyed the attention of researchers to
the same extent as the other optical tracking systems. [Ref. 7]

4. Acoustic Trackers 

Acoustic, or ultrasonic trackers provide reasonable update rates and accuracies, 
and are less expensive than magnetic trackers. However, just as magnetic trackers were 
limited by the underlying physics of magnetism, acoustic trackers are limited by the
physics of sound. Although ultrasonic systems have longer ranges than magnetic
systems, they must maintain line of sight, making obstruction and shadowing a problem.
The range of the system is dependent on wavelength. If wavelength is too short, acoustic
interference is minimized but range is minimized as well. On the other hand, if
wavelength is too long, latency becomes unacceptable and distance resolution suffers.
The middle frequency band that remains is susceptible to acoustic interference from
metallic objects, in addition to the echoes and reflections to which all sound is vulnerable.
[Ref. 24] 

5. Inertial and Magnetic Tracking 

Inertial and magnetic tracking is one of the newer motion tracking technologies. 
Although it has been used for tracking user head positions in various virtual reality
applications, until recently it had not been used for full body tracking. Professor Eric
Bachmann at the Naval Postgraduate School developed one of the first platforms for
using Magnetic Angular Rate Gravity (MARG) tracking as a full body motion tracking
system. [Ref. 7] 

With recent advances in micro-machined and miniaturized technology, inertial 
tracking has become an affordable and accurate option. Unlike the other sensing 
technologies discussed, inertial trackers contain no inherent latency and therefore should 
be more accurate than their counterparts. 

With inertial tracking, angular rate data is integrated to determine segment 
orientation. If this data were used alone, error would be introduced over time as bias and 
drift errors accumulated. However, with the addition of accelerometers to sense the
gravity vector and magnetometers to sense the local magnetic field, the inertial signal can 
be corrected and the errors minimized. The MARG sensors developed by Bachmann 
contain a separate accelerometer, rate sensor and magnetometer for each coordinate axis. 
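
The effect of correcting an integrated rate signal with an independent reference can be shown with a single-axis example. The Java fragment below is a minimal sketch under stated assumptions and is not the filter described in [Ref. 7]: the class and method names are hypothetical, the true angular rate is taken to be zero, the rate sensor has a constant bias, and the reference angle stands in for what the accelerometers and magnetometers would supply. It uses a simple complementary blend, with a gain chosen only for illustration.

```java
// Hypothetical single-axis sketch of drift correction; not the filter in [Ref. 7].
final class DriftCorrectionSketch {

    /** Integrates a biased rate signal with no correction and returns the final angle. */
    static double integrateOnly(double rateBias, double dt, int steps) {
        double angle = 0.0;
        for (int i = 0; i < steps; i++) {
            angle += rateBias * dt; // the true rate is zero, so only the bias accumulates
        }
        return angle;
    }

    /** Same integration, but each step is pulled toward a reference angle. */
    static double integrateWithCorrection(double rateBias, double dt, int steps,
                                          double referenceAngle, double gain) {
        double angle = 0.0;
        for (int i = 0; i < steps; i++) {
            double predicted = angle + rateBias * dt;                // inertial prediction
            angle = predicted + gain * (referenceAngle - predicted); // blend in the reference
        }
        return angle;
    }

    public static void main(String[] args) {
        double bias = 0.01; // rad/s, assumed sensor bias
        double dt = 0.01;   // seconds per sample (100 hertz)
        int steps = 10_000; // 100 seconds of data
        System.out.println("Uncorrected: " + integrateOnly(bias, dt, steps));
        System.out.println("Corrected:   "
                + integrateWithCorrection(bias, dt, steps, 0.0, 0.02));
    }
}
```

With these assumed values the uncorrected estimate drifts by roughly one radian over 100 seconds, while the corrected estimate settles at a small bounded offset, which is the behavior the paragraph above attributes to the added reference sensors.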

One drawback to the system developed by Bachmann is that only orientation is 
tracked, not position. Mobile platforms, such as submarines, integrate accelerometer data 
to obtain position, but their accelerometers are much larger and significantly more 
expensive. Currently, such techniques may only be applied for short time periods with
the small, low-grade sensors used for MARG tracking before drift introduces significant 
error. [Ref. 23] [Ref. 24] 
G. SUMMARY

This chapter discussed the Virtual Reality Modeling Language (VRML), and 
examined the powerful combination of VRML and Java working together. The 
Humanoid Animation 1.1 Specification and its canonical exemplar, Nancy.wrl, were also
discussed. Finally, a brief overview of current human body motion tracking was 
provided, including the method chosen for this research, inertial and magnetic (MARG) 
tracking developed at the Naval Postgraduate School. 
V. INITIAL DEVELOPMENT EFFORTS


A. INTRODUCTION 

This chapter discusses the challenges involved in reducing the complexity of laser 
scan output, translating between file formats and partitioning the resulting data cloud into 
body segments. Constructing an avatar from these body segments is then examined.
Finally, the process of making Java and VRML work together, and incorporating 
networked, real-time control is discussed. 

B. HANDLING LASER SCAN DATA 

1. Reducing Laser Scan Output Complexity

The first challenge of this research was to simplify the laser scan data set. The
raw output data consists of approximately 150,000 polygons. When translated into
ASCII text, the size of the file is over 50 megabytes. This large data set is extremely
unwieldy. If it could be rendered at all, use of a model of this nature in 3D worlds under 
current technology would be extremely inefficient, resulting in very slow frame rates. 
Figure 15 shows the original, unreduced laser scan output. 
Figure 15. Initial Laser Scan Output of the Author (150,000 Polygons).

For realistic rendering of humans in most applications, only 4000 to 5000 polygons are 
necessary. Texture mapping further assists in lowering the required polygon count, as the 
overlying texture adds greater surface detail. 

Laser scan output is in *.ply format, an Alias|Wavefront file type. Since this
format is open source, it is possible to write custom polygon reduction algorithms. As
the scope of this thesis is avatar construction and real-time avatar control, it was
considered beyond the scope of this research to create a custom algorithm for polygon
reduction. Instead, a proprietary software package from Cyberware Laboratories called
“Decimate” was used. The Decimate software can reduce model polygon count from the
initial number of 150,000 to whatever the user specifies [Ref. 26]. For this research, a
target polygon count of 10,000 polygons was used. Polygon reduction from 150,000
polygons to 10,000 polygons was completed in less than 15 seconds on a Pentium III,
550-megahertz system. The resulting ASCII text file size was approximately 750
kilobytes.
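
To give a sense of what a custom reduction algorithm might involve, the following Java fragment sketches vertex clustering, one of the simplest polygon reduction techniques. It is an illustration only and is not the method used by the Decimate software, whose internals are not described here. The class name, method signature and grid cell size are all assumptions. Vertices that fall in the same grid cell are merged, and triangles that collapse to a line or point are discarded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical vertex-clustering sketch; not the algorithm used by Decimate.
final class VertexClusteringSketch {

    /**
     * Merges all vertices that share a grid cell and returns the surviving
     * triangles, re-indexed against outVertices (one representative per cell).
     */
    static int[][] reduce(double[][] vertices, int[][] triangles,
                          double cellSize, List<double[]> outVertices) {
        Map<String, Integer> cellToIndex = new HashMap<>();
        int[] remap = new int[vertices.length];
        for (int i = 0; i < vertices.length; i++) {
            double[] v = vertices[i];
            String cell = (long) Math.floor(v[0] / cellSize) + ","
                    + (long) Math.floor(v[1] / cellSize) + ","
                    + (long) Math.floor(v[2] / cellSize);
            Integer index = cellToIndex.get(cell);
            if (index == null) {
                index = outVertices.size();
                cellToIndex.put(cell, index);
                outVertices.add(v.clone()); // first vertex seen represents the cell
            }
            remap[i] = index;
        }
        List<int[]> kept = new ArrayList<>();
        for (int[] t : triangles) {
            int a = remap[t[0]];
            int b = remap[t[1]];
            int c = remap[t[2]];
            if (a != b && b != c && a != c) { // drop triangles that have collapsed
                kept.add(new int[] {a, b, c});
            }
        }
        return kept.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        double[][] vertices = {{0, 0, 0}, {0.1, 0, 0}, {0, 0.1, 0}, {5, 0, 0}, {0, 5, 0}};
        int[][] triangles = {{0, 1, 2}, {0, 3, 4}};
        List<double[]> reduced = new ArrayList<>();
        int[][] kept = reduce(vertices, triangles, 1.0, reduced);
        System.out.println(kept.length + " triangle(s), " + reduced.size() + " vertices");
    }
}
```

A larger cell size removes more polygons at the cost of surface detail. Production tools generally use more careful criteria than a uniform grid, so this sketch should be read as a starting point rather than a substitute for a package such as Decimate.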

2. Translating Between Files 

The *.ply file format is intended for animation software packages. For real-time 
rendering into 3D worlds a different file format is needed. Specifically, the Virtual 
Reality Modeling Language (VRML) format (*.wrl) was chosen for reasons discussed in 
Chapter III. 

As was the case for polygon reduction, custom translators could be written since 
*.ply is open source. Again, file translation was considered beyond the scope of this 
research. Additionally, the Decimate software used for polygon reduction includes 
several file translators, including VRML. The disadvantage in using the VRML 
translator that comes with Decimate was that original texture information is lost when 
converting from *.ply to *.wrl format, thus, if texture mapping is desired it must be 
supplied by the modeler. Figure 16 shows the reduced, VRML model obtained from 
translation. 
Figure 16. Translated VRML Avatar (10,000 Polygons).

Two things should be noted about Figure 16. The first is the lack of texture
information, for the reason discussed in the preceding paragraph. The second item to
note is that the figure is one complete piece. That is, the figure is not articulated or
segmented in any way. The challenge of partitioning the avatar into appropriate body 
segments is discussed next. 

3. Segmenting The Avatar 

For a human model to be able to mimic the full range of motion of a human, it 
must be segmented in the appropriate places. Three approaches were considered for 
avatar segmentation. 
The first approach that was considered is completely automated. If it is possible
to infer the location of important body joints from the model data, then code may be 
written to extract such information and apportion body segments appropriately. One 
possibility is that when the individual gets scanned, they stand in such a way so that 
important joints are bent past some critical angle. The segmentation code could then look 
for sufficient change in the direction of surface normals, thereby indicating a bent joint. 
Since the initial laser scan was performed at the beginning of this research there were no 
preferences for scan position, and thus all joints are straight in the model. This being the 
case, completely automatic joint selection was not feasible. 
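
The surface-normal test proposed for automated segmentation can be sketched as follows. This Java fragment is a hypothetical illustration, not code written for this research. It assumes that surface normals have already been sampled in order along a limb, and that the critical angle is supplied by the caller; both the sampling scheme and the threshold are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of flagging bent joints from changes in surface normals.
final class NormalBendSketch {

    /** Angle in radians between two non-zero vectors. */
    static double angleBetween(double[] a, double[] b) {
        double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
        double lengthA = Math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);
        double lengthB = Math.sqrt(b[0] * b[0] + b[1] * b[1] + b[2] * b[2]);
        double cosine = dot / (lengthA * lengthB);
        cosine = Math.max(-1.0, Math.min(1.0, cosine)); // guard against rounding
        return Math.acos(cosine);
    }

    /** Returns each index where the normal turns by more than the critical angle. */
    static List<Integer> bendCandidates(double[][] normals, double criticalAngle) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 1; i < normals.length; i++) {
            if (angleBetween(normals[i - 1], normals[i]) > criticalAngle) {
                candidates.add(i);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Normals along a limb that is straight, then bent by 90 degrees.
        double[][] normals = {{0, 0, 1}, {0, 0, 1}, {1, 0, 0}, {1, 0, 0}};
        System.out.println(bendCandidates(normals, Math.toRadians(45)));
    }
}
```

On a scan in which every joint is straight, consecutive normals along a limb change very little, so no index would exceed the threshold. This is consistent with the conclusion that fully automatic joint selection was not feasible for the initial scan.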

Partial automation was considered next. If average body segment lengths were 
known, it would be possible to insert joints automatically at the average locations. The 
drawback to this method is accuracy. Joint locations can vary widely from subject to 
subject, so models using this method would be susceptible to inaccurate segmentation, 
resulting in unbelievable avatars. Since one of the major objectives of this research is to 
provide realistic, believable avatars this method of partial automation was discarded.

The final method of avatar segmentation that was considered, and ultimately used 
in this research, was completely manual. An operator imports the laser scan data into 
3D-rendering capable software, manually selects the segments, and then exports the
virtual body parts for use as avatar building blocks. A disadvantage to this method is that
it is time-consuming. Several hours are required for an operator to segment a model.
Another disadvantage is that model segmentation is somewhat arbitrary. One operator 
may segment a model very differently than another operator, especially if joint position is 
unclear, as in the initial scan. Lastly, this method requires operators trained in the use of 
whichever software package is selected. 

The software package used for avatar segmentation in this research was Maya,
from Alias|Wavefront. Maya has been used extensively in the film industry to provide
lifelike animation, and is adept at handling 3D objects [Ref. 27]. Maya can import and
export *.obj files, allowing for segmentation processing. A file translator provided with
Cyberware’s Decimate was used to convert the polygon-reduced laser scan from *.ply to
*.obj format. The *.obj file was then imported into Maya, and 3D selection of body
segments was performed (Figure 17). After each body segment was selected, it was
exported as a separate *.obj file. Unfortunately, at the time of this research Decimate did 
not contain translators to convert directly from *.obj to *.wrl (VRML), so it was 
necessary to first convert each body segment from *.obj to *.ply, then *.ply to *.wrl. 



Figure 17. Reduced Laser Scan Imported Into Maya.


C. CONSTRUCTING THE AVATAR 

After segmentation in Maya, the model consisted of several VRML files, one file 
for each body segment. Each file contains VRML Shape nodes, with geometry data 
called an “IndexedFaceSet.” This geometry data contains x-y-z coordinates for each
point to be rendered, along with indexing information indicating the order in which to 
render the points. Order is important because VRML supports back-face culling for 
efficiency. Back-face culling is an operation performed by rendering engines that only 
draws the external faces of objects. Since observer viewpoint is seldom concerned with 
internal views of a “solid” object, drawing only the external sides can significantly reduce
the processing workload, and result in higher frame rates. VRML, like most current 3D 
rendering engines, determines which sides are external and which are internal by the 
order in which points are drawn. 
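
The dependence of back-face culling on point order can be demonstrated with a single triangle. The Java fragment below is a hypothetical sketch rather than code from any VRML browser. It assumes the common convention that a face is external when its vertices appear counterclockwise from the viewpoint, and the class and method names are invented for illustration.

```java
// Hypothetical sketch of a winding-order test; not taken from any VRML browser.
final class WindingOrderSketch {

    /** Normal of triangle (p0, p1, p2); its direction follows the vertex order. */
    static double[] normal(double[] p0, double[] p1, double[] p2) {
        double ux = p1[0] - p0[0], uy = p1[1] - p0[1], uz = p1[2] - p0[2];
        double vx = p2[0] - p0[0], vy = p2[1] - p0[1], vz = p2[2] - p0[2];
        return new double[] {
            uy * vz - uz * vy,
            uz * vx - ux * vz,
            ux * vy - uy * vx
        };
    }

    /** True when the vertices appear counterclockwise from the eye point. */
    static boolean isFrontFacing(double[] p0, double[] p1, double[] p2, double[] eye) {
        double[] n = normal(p0, p1, p2);
        double dot = n[0] * (eye[0] - p0[0])
                + n[1] * (eye[1] - p0[1])
                + n[2] * (eye[2] - p0[2]);
        return dot > 0.0; // a renderer that culls back faces draws only these
    }

    public static void main(String[] args) {
        double[] a = {0, 0, 0};
        double[] b = {1, 0, 0};
        double[] c = {0, 1, 0};
        double[] eye = {0, 0, 5};
        System.out.println("a, b, c drawn: " + isFrontFacing(a, b, c, eye));
        System.out.println("a, c, b drawn: " + isFrontFacing(a, c, b, eye));
    }
}
```

The same three points yield opposite results when two of them are swapped, which is why the indexing information exported for each body segment must preserve a consistent order.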


The Humanoid Animation Specification 1.1 (H-Anim 1.1) canonical example 
Nancy.wrl, created by Cindy Ballreich, was used (see Figures 10-13). Nancy’s segments
were constructed using indexed face sets and indexing data. To construct the
laser-scanned avatar, the information (x-y-z coordinates and indexing data) from each of
Nancy’s original segments was replaced with the corresponding information from each of
the segment VRML files exported from Maya. Each of the new segments was scaled,
and connected together by appropriate rotation and translation. The result is an
articulated VRML/H-Anim 1.1 avatar originating from a laser scan, capable of scripted
behaviors. As discussed earlier, texture information is not present due to inadequacies
with Cyberware Laboratory’s *.ply to *.wrl translator, so Nancy’s default colors were
used initially as shown in Figure 18. 



Figure 18. Initial VRML/H-Anim 1.1 Avatar From Laser-Scan (Untextured).
Two things should be noted about Figure 18. The first is that there are no
polygons above the hairline. The physical characteristics of hair yield a poor return
signal during the laser-scan process, resulting in the loss of coordinate information for the
portions of the skull covered by hair. The second thing to note is the gap appearing
between the right arm and the torso. Due to the posture of the human subject during the
scanning process, the arms effectively blocked the laser signal from reaching the left and
right sides of the torso, resulting in loss of coordinate information. Similar situations
exist with other body segments, such as the legs.

Both the hair-interference and segment-shadowing problems may be compensated
for using standard 3D editing techniques, either using popular animation programs such
as Maya or directly by point editing in VRML. The segment-shadowing problem may
also be minimized by placing the human model in a posture that maximizes exposed
surface area before the laser scan begins.

A third problem with the avatar, which is not immediately apparent from Figure
18, is one of joint visual connectivity. Since the avatar was obtained from a static laser
scan, when segments are moved from the initial scan position “tears” can be seen at the
joints where segments are connected. For example, consider a human arm. When
someone moves a forearm in the physical world, the skin, muscles and tendons stretch to
accommodate the varying positions of the forearm relative to the upper arm. In the final
avatar created by this research, when segments move relative to each other the surface
topology does not stretch or otherwise compensate for varying segment positions,
resulting in visual tearing between segments during some avatar movements.

For added realism, a standard green camouflage pattern was applied to each
avatar segment, with the exception of the head, hands and feet. For the head, a different 
process was used. 

3Q, Incorporated [Ref. 28] specializes in optical triangulation scanning of the
human face. They have booths in some software entertainment stores that perform
digitization of faces. The digitized face can then be imported into a variety of popular
computer games, and also contains a VRML rendition. This VRML output was used to
replace the existing avatar skull obtained from the laser-triangulation scan performed by
Cyberware Laboratories. 

The end product is a fully articulated, texture-mapped avatar that is capable of 
scripted movement via an international standard, H-Anim 1.1. See Figure 19. 



Figure 19. Texture Mapped, Articulated Avatar Capable of H-Anim 1.1 Scripted Movement.

D. USING JAVA TO PROVIDE REAL-TIME NETWORKED CONTROL 

Although capable of scripted movement, the final product must also be 
controllable via network updates. Adding the open-source DIS-Java-VRML [Ref. 29] 
package to Java makes communication between VRML and Java possible. In this case, 
VRML renders the 3D scene and Java handles the networking. Refer to chapter 3 for an 
in-depth discussion on how VRML and Java can work together. 

The network protocol chosen for this implementation is the User Datagram
Protocol (UDP). UDP is connectionless. Although not as reliable, it is faster than
Transmission Control Protocol/Internet Protocol (TCP/IP) [Ref. 30]. Strict packet
accountability was deemed unnecessary for this research, since segment orientations are 
typically updated at approximately 100 hertz [Ref. 7]. 

Two UDP approaches were considered. The first approach examined was the 
built-in UDP functionality provided by DIS-Java-VRML, which provides an easy-to-use
UDP class called the Protocol Data Unit (PDU). Twenty-seven PDU fields are defined in
the 1995 IEEE Standard for DIS-Application Protocols [Ref. 31]. Since the only
information that needs to be passed is a field containing segment identification and four
other fields containing orientation information, the PDU was dismissed as being too
heavyweight and therefore inefficient for this application. The second approach involved
custom packet design. Although this technique involved more coding and design, it was 
ultimately selected due to its superior efficiency, since a packet could consist of only five 
fields versus the twenty-seven fields contained in the PDU. 
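
A five-field packet of the kind described above might be laid out as in the following Java sketch. The exact field types, order and byte layout used in this research are not stated here, so everything in the fragment is an assumption: a 4-byte integer segment identifier followed by four 4-byte floating point orientation values, in the default big-endian order of java.nio.ByteBuffer. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical five-field packet layout; field types and order are assumptions.
final class OrientationPacketSketch {

    /** One 4-byte segment identifier plus four 4-byte orientation values. */
    static final int PACKET_BYTES = Integer.BYTES + 4 * Float.BYTES;

    /** Packs the five fields into a byte array suitable for a datagram payload. */
    static byte[] encode(int segmentId, float[] orientation) {
        if (orientation.length != 4) {
            throw new IllegalArgumentException("Expected four orientation values");
        }
        ByteBuffer buffer = ByteBuffer.allocate(PACKET_BYTES);
        buffer.putInt(segmentId);
        for (float value : orientation) {
            buffer.putFloat(value);
        }
        return buffer.array();
    }

    /** Reads the segment identifier from the start of the payload. */
    static int decodeSegmentId(byte[] packet) {
        return ByteBuffer.wrap(packet).getInt();
    }

    /** Reads the four orientation values that follow the identifier. */
    static float[] decodeOrientation(byte[] packet) {
        ByteBuffer buffer = ByteBuffer.wrap(packet);
        buffer.getInt(); // skip the segment identifier
        float[] orientation = new float[4];
        for (int i = 0; i < orientation.length; i++) {
            orientation[i] = buffer.getFloat();
        }
        return orientation;
    }

    public static void main(String[] args) {
        byte[] packet = encode(3, new float[] {1f, 0f, 0f, 0f});
        System.out.println("Payload size in bytes: " + packet.length);
        System.out.println("Segment: " + decodeSegmentId(packet));
    }
}
```

Under these assumptions the payload is 20 bytes per update. The byte array could then be handed to a java.net.DatagramPacket for transmission, which keeps the per-update cost small at update rates near 100 hertz.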

The final code consists of the following files: JavaDutton.wrl (the file containing
the VRML content), greenCamo.jpg (contains the green camouflage texture map),
clone.gif (the face picture to be texture mapped onto the avatar’s face),
ScriptNodeFieldControl.java (VRML/Java interface), ClientProgram.java (receives UDP
segment orientation updates) and QuatToEuler.java (converts the quaternions received
from the MARG sensors to Euler angles for VRML use).
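
A quaternion to Euler angle conversion of the kind performed by QuatToEuler.java can be sketched as follows. This is not the thesis code: the rotation order and axis conventions actually used are not stated here, so the fragment assumes a unit quaternion (w, x, y, z) and the widely used roll, pitch, yaw sequence (rotations about x, y and z respectively). The class name is hypothetical.

```java
// Hypothetical sketch; the rotation order used by the thesis code is not stated.
final class QuatToEulerSketch {

    /** Returns {roll, pitch, yaw} in radians for a unit quaternion (w, x, y, z). */
    static double[] toEuler(double w, double x, double y, double z) {
        double roll = Math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y));
        double sinPitch = 2.0 * (w * y - z * x);
        sinPitch = Math.max(-1.0, Math.min(1.0, sinPitch)); // guard against rounding
        double pitch = Math.asin(sinPitch);
        double yaw = Math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z));
        return new double[] {roll, pitch, yaw};
    }

    public static void main(String[] args) {
        // A quarter turn about the z axis: w = cos(45 degrees), z = sin(45 degrees).
        double half = Math.sqrt(0.5);
        double[] euler = toEuler(half, 0.0, 0.0, half);
        System.out.println("Yaw in degrees: " + Math.toDegrees(euler[2]));
    }
}
```

Euler angles lose a degree of freedom when the pitch approaches 90 degrees, so a conversion like this one is best suited to poses away from that singularity.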

An additional class, ServerProgram.java, was written to support testing. 
Unfortunately, at the time of this writing the body tracking software developed at the 
Naval Postgraduate School only outputs to text files [Ref. 7]. In anticipation of network 
updates, the ServerProgram class simulates direct networking by parsing a pre-recorded 
body tracking session, wrapping the data into a UDP packet, and sending it over the 
network. When the ClientProgram class receives data over the network, it is unaware 
that the source was originally a text file. One drawback to this method is speed: parsing 
the data from a text file slows down the update process considerably, resulting in an 
animation playback that is an order of magnitude slower than animation being driven by 
pure network updates. 
To run the networked VRML avatar, all of the Java class files must be in the same
directory as JavaDutton.wrl, greenCamo.jpg and clone.gif. Assuming a VRML viewer
plug-in has been installed in the web browser, the user double-clicks on JavaDutton.wrl.
The initial 3D scene is rendered, and ScriptNodeFieldControl is automatically called.
ScriptNodeFieldControl accepts and initializes the avatar segment nodes along with a text
message that is rendered in front of the avatar. ClientProgram is automatically called by
ScriptNodeFieldControl as a separate thread, which then listens for UDP packet updates.
When users are ready to receive network updates, they click on the text in front of the
avatar in the VRML scene. The text message changes, and body segment orientations are
continuously updated from an array containing the most recent orientation data. For
testing, ServerProgram was started, with a command line argument containing the
filename of the pre-recorded body tracking data. ServerProgram parses the input file,
calls QuatToEuler to convert quaternions to equivalent Euler angles, and sends the data
over the network in the form of UDP packets. When ClientProgram receives a UDP
packet, it unwraps the packet and calls an update method in ScriptNodeFieldControl,
which then updates the array containing the most recent segment orientations. The next
time the VRML scene graphics are refreshed, these most recent orientations are read from
the array, thus updating the avatar’s motion.
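
The hand-off between the network thread and the scene refresh described above can be sketched with a small shared store. The Java fragment below is hypothetical and is not the ScriptNodeFieldControl source. It assumes four orientation values per segment and uses synchronized methods so that a refresh never reads a half-written update; the class name, method names and segment count of 15 (the number of segments in Nancy) are chosen for illustration.

```java
// Hypothetical sketch of the shared "most recent orientation" array.
final class LatestOrientationSketch {

    private final float[][] latest;

    LatestOrientationSketch(int segmentCount) {
        latest = new float[segmentCount][4];
    }

    /** Called by the network thread after a packet has been unwrapped. */
    synchronized void update(int segmentId, float[] orientation) {
        System.arraycopy(orientation, 0, latest[segmentId], 0, 4);
    }

    /** Called when the scene is refreshed; returns a copy of the newest values. */
    synchronized float[] read(int segmentId) {
        return latest[segmentId].clone();
    }

    public static void main(String[] args) throws InterruptedException {
        LatestOrientationSketch store = new LatestOrientationSketch(15);
        // Stand-in for the client thread that listens for UDP packets.
        Thread network = new Thread(() -> {
            for (int i = 1; i <= 100; i++) {
                store.update(0, new float[] {i, 0f, 0f, 0f});
            }
        });
        network.start();
        network.join();
        System.out.println("Newest value for segment 0: " + store.read(0)[0]);
    }
}
```

Only the newest orientation per segment is retained, so a refresh that runs slower than the network simply skips intermediate updates. This fits the earlier observation that strict packet accountability is unnecessary at high update rates.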


E. SUMMARY 

This chapter examined the process used to reduce the polygon count of the initial 
laser scan, translate the data into various file formats, and partition the data into body 
segments. Additionally, avatar construction from these body segments was discussed.
Finally, the process of making Java and VRML communicate with each other, and
providing for networked, real-time control was examined. 
VI. CONCLUSIONS AND RECOMMENDATIONS


A. GENERAL THESIS CONCLUSIONS 

Construction of an articulated avatar from laser scans for use in 3D networked 
virtual environments has been achieved. The resulting avatar resembles the original 
human to a scale of millimeters and runs efficiently on current standard computer desktop
systems. The 3D engine can be user and programmer-friendly, platform independent and 
open-source. Avatars can be driven by either scripted behaviors, or by real-time control 
via networks. 

B. SPECIFIC CONCLUSIONS AND RESULTS 

1. Constructing Anatomically Accurate Avatars 

The H-Anim 1.1 exemplar Nancy.wrl, created by Cindy Ballreich, was used as the
foundation and served as inspiration for this project. By replacing Nancy’s coordinate
and indexing data with laser scan data, exact anatomical avatars can be constructed that
conform to an international human animation specification. 

2. Flexible Avatar Control 

Avatar control is possible by either programmed or real-time input. Programmed, 
or scripted, input follows the H-Anim 1.1 specification. Real-time input may be 
accomplished over a network, with control devices sending UDP packets containing limb
segment orientation updates. Specifically, real-time networked control via wireless
motion tracking sensors developed at the Naval Postgraduate School was implemented as
proof-of-concept.

3. Source Code is Platform-Independent and Open-Source 

By restricting computer source code to VRML and Java, the final product 
produced by this research will run on various platforms and operating systems via
popular web browsers. All source code is open-source, allowing for both inspection and
enhancement as technology progresses. 

4. Simple 3D Authoring is Combined With Powerful Programming Capability

VRML is a high-level, easily understood 3D authoring language. Java is a 
powerful and widely used programming language. Highly complex results may be 
obtained when the two are combined, producing a product that is more than the sum of its 
parts. With gains in computing technology, both Java and VRML are approaching
run-time speeds previously enjoyed only by low-level programming languages, allowing easy
creation of intricate scenes and behaviors through relatively simple interfaces. 


C. LESSONS LEARNED 

The laser scan of the author was performed at the beginning of this research. 
During this thesis it became clear that the scan pose was less than optimum. Not only 
were body segments masking other portions of the body from the laser signal, but also all 
of the limb segments were straight, making it difficult to determine exact joint position
during the segmentation process. With the knowledge gained during this research, it is
recommended that future avatar scans be performed differently. First, limbs should be
positioned in such a way as to minimize masking other portions of the body from the
laser signal. Second, all of the major joints should be bent as close to 90 degrees as
possible. Not only does this provide clear indication of joint location for manual
segmentation, but it also provides a clear division between limb segments for possible
automated segmentation. 

D. RECOMMENDATIONS FOR FUTURE WORK 

1. Automating Avatar Segmentation and Construction 

Segmenting the avatar into appropriate body segments for articulation, and then 
constructing the avatar from these segments, was by far the most time-intensive 
component of this research. Advanced knowledge of third-party animation software, in 
this case Maya, was required. Since segmentation was done manually, achieving accurate 
and consistent joint separations was both difficult and time-consuming. If the initial scan 
pose is modified as discussed earlier, it should be possible to automatically determine 
joint location, based either on the relatively rapid change in surface normal direction or 
on some other algorithm. These automatically generated segments could then be placed 
together using an avatar template. In this manner, articulated avatars could be 
constructed in a few minutes instead of a few days, allowing for rapid content creation. 
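
As a rough illustration of the surface-normal idea, the following Java sketch scans a sequence of surface normals sampled along a path running down a limb and reports where the normal direction turns most sharply. It is a minimal sketch under stated assumptions, not a method implemented in this research: the class name JointLocator, the method names, and the idea of sampling normals along a single path are all hypothetical, and a practical segmenter would also need noise filtering and a minimum-angle threshold.

```java
// Hypothetical sketch: estimate a joint position along a limb by finding
// where the sampled surface normal changes direction most sharply.
final class JointLocator {

    /** Angle in radians between two 3D vectors (they need not be unit length). */
    static double angleBetween(double[] a, double[] b) {
        double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
        double lenA = Math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);
        double lenB = Math.sqrt(b[0] * b[0] + b[1] * b[1] + b[2] * b[2]);
        // Clamp to [-1, 1] so rounding error cannot push acos out of range.
        double cosine = Math.max(-1.0, Math.min(1.0, dot / (lenA * lenB)));
        return Math.acos(cosine);
    }

    /**
     * normals[i] is the surface normal at the i-th sample along the limb
     * (an assumed input; the sampling scheme is not part of the thesis).
     * Returns the index i for which the turn between sample i and sample
     * i + 1 is largest, or -1 if there are fewer than two samples.
     * Zero-length normals yield NaN angles and are never selected.
     */
    static int findJointSample(double[][] normals) {
        int bestIndex = -1;
        double bestAngle = -1.0;
        for (int i = 0; i + 1 < normals.length; i++) {
            double angle = angleBetween(normals[i], normals[i + 1]);
            if (angle > bestAngle) {
                bestAngle = angle;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

With joints bent near 90 degrees, as recommended in the lessons learned, the largest turn along such a path would be expected to fall at the joint, which is why the scan pose matters for this kind of automation.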

2. Updating File Translators to Retain Texture Information 

During the whole-body laser scan, texture information as well as x-y-z coordinate 
information is obtained. Unfortunately, as discussed in Chapter V, the current file 
translators provided by Cyberware lose texture information when converting scan data to 
VRML. Cyberware is aware of this problem, and may resolve this issue in a later 
software release. Alternatively, since laser scan output is in open-source *.ply format, 
custom translators that properly retain texture information could be written. One possible 
approach would be to build texture-preserving file translation into the automatic avatar 
segmentation and construction process discussed earlier. 
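
A custom translator would first need to discover which per-vertex properties a scan file declares. The Java sketch below reads the text header of a PLY file and lists the vertex property names, then checks for color channels. It is only a sketch under assumptions: the class and method names are hypothetical, and it assumes that color is stored per vertex under the conventional property names red, green, and blue, which may not match how the Cyberware scanner stores its texture data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: inspect a PLY header to see whether per-vertex
// color is declared, as a first step toward a texture-preserving translator.
final class PlyHeader {

    /** Returns the property names declared for the "vertex" element. */
    static List<String> vertexProperties(String headerText) {
        List<String> names = new ArrayList<>();
        boolean inVertexElement = false;
        for (String line : headerText.split("\\r?\\n")) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens[0].equals("end_header")) {
                break; // vertex and face data follow; the header is finished
            }
            if (tokens[0].equals("element")) {
                // Properties belong to the most recently declared element.
                inVertexElement = tokens.length > 1 && tokens[1].equals("vertex");
            } else if (inVertexElement && tokens[0].equals("property")) {
                // The property name is always the last token on the line.
                names.add(tokens[tokens.length - 1]);
            }
        }
        return names;
    }

    /** True if red, green, and blue are all declared (assumed naming). */
    static boolean hasColor(List<String> properties) {
        return properties.contains("red")
                && properties.contains("green")
                && properties.contains("blue");
    }
}
```

If the color channels are present, a translator could carry them through to per-vertex color in the VRML output instead of discarding them.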

3. Keep Joints Connected in All Positions 

Since the avatar was created from a static model, visual tearing occurs when the 
relative positions between limb segments are changed. This detracts from avatar realism 
and overall appearance. An important improvement would be to use displacers or meshes 
to keep limb segments cohesive through all ranges of motion. Some advanced features of 
VRML support such techniques [Ref. 18]. 


4. Increase Behavior and Motion Libraries 

Since the capability of scripted avatar control is provided via H-Anim 1.1, an 
extensive library of ready-to-use behaviors would be of great benefit to virtual 
environment designers. Depending on the target application, pre-existing complicated 
behaviors could be imported with little or no extra development time, 
allowing for more rapid and believable content creation. 

5. Update Existing Body-Tracking Code 

Currently the body-tracking code, written by Professor Eric Bachmann, can only 
record limb segment orientation updates to text files, and not to a network [Ref. 7]. For 
testing, this research parses a pre-recorded motion capture text file, and then sends update 
packets over the network. Modifying the body-tracking software to send updates 
directly to the network, thus eliminating file input/output, could result in a significant 
increase in efficiency. Higher frame rates and virtual environments capable of supporting 
many more users would be possible. 
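
To make the direct-to-network idea concrete, the Java sketch below packs one limb segment orientation into a small binary payload and sends it as a UDP datagram, with no intermediate file. This is an illustrative sketch only: the class name, the 20-byte layout (an integer segment identifier followed by four floats), and the assumption that orientation is expressed as a quaternion are all hypothetical and are not the packet format used in this research.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;

// Hypothetical sketch: send a limb segment orientation straight to the
// network instead of writing it to a text file first.
final class OrientationUpdate {

    /** Payload size in bytes: one int identifier plus four floats. */
    static final int SIZE = Integer.BYTES + 4 * Float.BYTES;

    /** Packs a segment id and quaternion (w, x, y, z) in big-endian order. */
    static byte[] encode(int segmentId, float w, float x, float y, float z) {
        return ByteBuffer.allocate(SIZE)
                .putInt(segmentId)
                .putFloat(w).putFloat(x).putFloat(y).putFloat(z)
                .array();
    }

    /** Reads the segment id back out of a payload built by encode. */
    static int decodeSegmentId(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    /** Reads the quaternion back out as {w, x, y, z}. */
    static float[] decodeQuaternion(byte[] payload) {
        ByteBuffer buffer = ByteBuffer.wrap(payload);
        buffer.getInt(); // skip the segment id
        return new float[] {
            buffer.getFloat(), buffer.getFloat(), buffer.getFloat(), buffer.getFloat()
        };
    }

    /** Sends one payload as a single UDP datagram. */
    static void send(DatagramSocket socket, InetAddress host, int port, byte[] payload)
            throws IOException {
        socket.send(new DatagramPacket(payload, payload.length, host, port));
    }
}
```

The tracking loop would call encode and send once per sensor reading, so each update reaches the network as soon as it is produced.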

6. Construct Avatar in Other Programming Languages 

Open source, platform independence, and ease of authoring were major tenets of 
this research. Depending on the target application, programmers may have different 
goals. One way to meet different criteria is to construct an avatar from laser scan data 
using other 3D programming languages and interfaces. Some possible candidates are 
Java3D [Ref. 32], OpenGL [Ref. 33] or DirectX [Ref. 34]. 


E. SUMMARY 

This research has demonstrated an efficient, cost-effective method of converting 
laser scan data into realistic, dimensionally accurate avatars. The avatars are open-source 
and platform independent, and can be controlled via either programmed behaviors or by 
real-time network updates. Real-time avatar control was developed using the Magnetic 
Angular Rate Gravity (MARG) sensors developed at the Naval Postgraduate School. 